Remaining useful lifetime estimation for discrete power electronic devices using physics-informed neural network

Estimation of the Remaining Useful Lifetime (RUL) of discrete power electronics is important to enable predictive maintenance and ensure system safety. Conventional data-driven approaches using neural networks have been applied to address this challenge. However, because they ignore the physical properties of the target RUL function, neural networks can produce unreasonable RUL estimates, such as estimates that increase over time or end at wrong values. In this paper, we apply the fundamental principle of Physics-Informed Neural Network (PINN) to enhance Recurrent Neural Network (RNN) based RUL estimation methods. By formulating proper constraints into the loss function of neural networks, we demonstrate in our experiments with the NASA IGBT dataset that PINN makes the trained neural networks more realistic and thus achieves performance improvements in estimation error and coefficient of determination. Compared to the baseline vanilla RNN, our physics-informed RNN improves the Mean Squared Error (MSE) of out-of-sample estimation on average by 24.7% in training and by 51.3% in testing; compared to the baseline Long Short-Term Memory (LSTM, a variant of RNN), our physics-informed LSTM improves the MSE of out-of-sample estimation on average by 15.3% in training and by 13.9% in testing.

www.nature.com/scientificreports/
estimate the device RUL, as prior studies 2,5,33 did. Informally, the RUL estimation problem aims to find a mapping from a collector-emitter voltage $V_{ce}$ series to its corresponding RUL series. We can formulate the problem as follows.
We are given a collector-emitter voltage series $\{V_{ce}(t)\}$, where $V_{ce}(t) \in \mathbb{R}$ and $t \in [0, N_f]$; $t$ is measured in (test) cycles, and $N_f$ is the lifetime of the device. When $t \ge N_f$, the device fails. The goal of RUL estimation is to find a mapping function $F$ such that $V_{ce}(t) \to RUL(t)$, i.e., $\widehat{RUL}(t) = F(V_{ce}(t))$, where $\widehat{RUL}(t) \in \mathbb{N}$, $t \in [0, N_f]$, is the estimated RUL series.
The RUL estimation problem can be treated as a supervised learning problem, meaning that there is a target RUL series $RUL(t)$. As of now, the commonly used target RUL function $RUL(t)$ is a simple linear function that starts from a normalized value of 1 and decreases to 0. It can be mathematically expressed as follows.
$$RUL(t) = 1 - \frac{t}{N_f}, \quad t \in [0, N_f] \tag{1}$$
In this equation, the exact failure time $N_f$ is unknown, and thus it is impossible to determine the end of lifetime ($RUL(t = N_f) = 0$). However, in our RUL estimation, we consider a relative (not absolute) lifetime. In Eq. (1), the term $t/N_f$ is a fraction of the total lifetime $N_f$. We can then derive the following two physical rules, or properties, directly from Eq. (1):
1. Monotonic decreasing condition: There is a monotonically decreasing linear relationship between $t$ and $RUL(t)$, even though the degradation rate $1/N_f$ may vary from device to device.
2. Boundary condition: $RUL(0) = 1$, meaning that the device has its full lifetime (100%) at the beginning $t = 0$, and $RUL(N_f) = 0$, meaning that the device fails when $t = N_f$.
Conventional neural network training will ignore such conditions since they are not embedded into the loss function of the neural network. In our work, we will utilize the two conditions to formulate two PINN constraints into the loss function.
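To make the two properties concrete, the following minimal sketch builds the normalized linear target described around Eq. (1) and checks both conditions numerically. The function name `linear_target_rul` and the lifetime of 200 cycles are hypothetical choices for illustration only; they are not taken from the paper or the dataset.

```python
import numpy as np

def linear_target_rul(n_f: int) -> np.ndarray:
    """Normalized linear target RUL(t) = 1 - t / N_f for t = 0, 1, ..., N_f."""
    if n_f <= 0:
        raise ValueError("n_f must be a positive number of cycles")
    t = np.arange(n_f + 1)
    return 1.0 - t / n_f

# 200 cycles is a made-up lifetime, used only to illustrate the two properties.
rul = linear_target_rul(200)

# Monotonic decreasing condition: every step goes down.
is_monotonic = bool(np.all(np.diff(rul) < 0))
# Boundary condition: full lifetime at t = 0, failure at t = N_f.
starts_at_one = rul[0] == 1.0
ends_at_zero = rul[-1] == 0.0
```

Any estimate that rises between consecutive cycles or leaves the interval [0, 1] violates one of these two checks, which is what the loss terms introduced later penalize.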

RUL estimation method using RNN
To better handle sequence data, the neural network designed for RUL estimation is a many-to-one RNN, as shown in Fig. 1. The network has three layers. The first layer, a recurrent layer, has 80 neurons; the second and third layers are fully connected layers with 10 neurons and one neuron, respectively. The recurrent layer can be unfolded into $s$ time steps. In our model, $s$ is set to 10, meaning that the network takes in 10 consecutive values of one input feature and produces one output. The mathematical equations of the constructed RNN structure are given for the first layer in Eq. (2), for the second layer in Eq. (3), and for the third layer in Eq. (4). For each pair of input $[X_t : X_{t+s-1}]$ and output $Y_{t+s-1}$, $t, s \in \mathbb{N}$, we have
$$h_t = \tanh(W_h \cdot h_{t-1} + W_x \cdot X_t + b_h) \tag{2}$$
where $h_t$ is the first layer output, i.e., the hidden state vector, at time $t$; $h_{t-1}$ is the hidden state vector at time $t-1$; $W_h$ is the weights of the hidden state vector, i.e., the weights between $h_t$ and $h_{t-1}$; $W_x$ is the weights between the input and the hidden state vector; $X_t$ is the input at time $t$; and $b_h$ is the bias vector of this layer. The activation function is the hyperbolic tangent function $\tanh()$.
$$f^1_{t+s-1} = \tanh(W_y \cdot h_{t+s-1} + b_y) \tag{3}$$
where $f^1_{t+s-1}$ is the second layer output at time $t+s-1$, with $s = 10$ being the time step of the RNN; $W_y$ is the weights between the first layer and the second layer; and $b_y$ is the bias vector added in this layer. The activation function is the hyperbolic tangent function $\tanh()$.
$$Y_{t+s-1} = f^2_{t+s-1} = W_z \cdot f^1_{t+s-1} + b_z \tag{4}$$
where $Y_{t+s-1}$ is the output of the model, i.e., the output of the third layer $f^2_{t+s-1}$; $W_z$ is the weights between the second layer and the third layer; and $b_z$ is the bias vector of the third layer. The activation function is the linear function.
Figure 1. Structure of the designed RNN: The first layer, which is a recurrent layer, has 80 neurons; the second layer and the third layer are fully connected layers with 10 neurons and one neuron, respectively; and the time step $s$ is set to 10.
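The three-layer structure described above can be sketched as a plain NumPy forward pass. This is an illustrative sketch only, not the authors' implementation: the helper names, the random initialization, and the zero initial hidden state are assumptions introduced here.

```python
import numpy as np

def init_params(rng, n_in=1, n_hidden=80, n_dense=10):
    """Small random weights; shapes follow the 80-10-1 layout in the text.
    The initialization scale is an arbitrary assumption."""
    scale = 0.1
    return {
        "W_x": rng.normal(0.0, scale, (n_hidden, n_in)),
        "W_h": rng.normal(0.0, scale, (n_hidden, n_hidden)),
        "b_h": np.zeros(n_hidden),
        "W_y": rng.normal(0.0, scale, (n_dense, n_hidden)),
        "b_y": np.zeros(n_dense),
        "W_z": rng.normal(0.0, scale, (1, n_dense)),
        "b_z": np.zeros(1),
    }

def rnn_forward(window, p):
    """Many-to-one pass: `window` has shape (s, n_in); returns one scalar."""
    h = np.zeros(p["W_h"].shape[0])        # initial hidden state (assumed zero)
    for x_t in window:                     # unfold the recurrent layer over s steps
        h = np.tanh(p["W_h"] @ h + p["W_x"] @ x_t + p["b_h"])
    f1 = np.tanh(p["W_y"] @ h + p["b_y"])  # second layer, tanh activation
    y = p["W_z"] @ f1 + p["b_z"]           # third layer, linear activation
    return float(y[0])

rng = np.random.default_rng(0)
params = init_params(rng)
window = rng.normal(size=(10, 1))          # s = 10 consecutive values of one feature
y_hat = rnn_forward(window, params)
```

Only the hidden state after the last of the 10 steps is passed to the fully connected layers, which is what makes the network many-to-one.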
Since RUL estimation is a regression problem, we use the Mean Squared Error (MSE) to measure the loss during network training. Let $E_{residual} = Y - \hat{Y}$ be the difference between the labeled value $Y$ (ground truth) and the predicted value $\hat{Y}$. The loss function of the RNN, $E_{RNN}$, can be written in the form of ordinary least squares as follows.
$$E_{RNN} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 \tag{5}$$
where $n$ is the number of training samples.

RUL estimation method with PINN
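After introducing the baseline RNN for RUL estimation and its loss function, we present our loss function with physics-informed regularization, $E_{PINN}$, which is defined as follows.
$$E_{PINN} = (1-\alpha)\,\frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 + \alpha\gamma\,\frac{1}{n}\sum_{i=2}^{n}\mathrm{ReLU}(\hat{Y}_i - \hat{Y}_{i-1}) + \beta\,\frac{1}{n}\sum_{i=1}^{n}\left[\mathrm{ReLU}(-\hat{Y}_i) + \mathrm{ReLU}(\hat{Y}_i - 1)\right] \tag{6}$$
where $Y_i$ denotes the ground truth of the normalized RUL; $\hat{Y}_i$ denotes the predicted RUL; $\alpha$, $\beta$ and $\gamma$ are hyperparameters that control the weights of the three terms; and ReLU is the rectified linear activation function, $\mathrm{ReLU}(x) = \max(0, x)$. The loss function contains three parts: the first part is the error between the prediction and the label, the second part is the monotonic decreasing constraint, and the third part is the boundary constraint.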
• Ordinary least squares (OLS). OLS is the common least squares measure for minimizing $E_{residual} = Y - \hat{Y}$, which is the distance between the labeled value $Y$ and the predicted value $\hat{Y}$. The loss is the mean of the squared differences between them and can be defined as
$$E_{OLS} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$
This part is the same as that for the RNN, Eq. (5).
• Monotonic decreasing constraint (MDC). In RUL estimation, one physical rule is that the RUL should only decrease over time. The previous predicted RUL $\hat{Y}_{i-1}$ should be larger than the current predicted RUL $\hat{Y}_i$; otherwise, the loss is $\hat{Y}_i - \hat{Y}_{i-1}$. Mathematically, this can be written as follows.
$$\begin{cases} \hat{Y}_i - \hat{Y}_{i-1}, & \hat{Y}_i > \hat{Y}_{i-1} \\ 0, & \text{otherwise} \end{cases}$$
which can be conveniently expressed as $\mathrm{ReLU}(\hat{Y}_i - \hat{Y}_{i-1})$. We use this formulation to make sure that it contributes to the error only when $\hat{Y}_i$ is larger than $\hat{Y}_{i-1}$. The MDC loss can thus be defined as follows.
$$E_{MDC} = \frac{1}{n}\sum_{i=2}^{n}\mathrm{ReLU}(\hat{Y}_i - \hat{Y}_{i-1})$$
• Boundary condition constraint (BCC). The boundary condition for the normalized RUL is $\hat{Y}_i \in [0, 1]$. An error occurs only when the estimates go beyond the boundary conditions. For each predicted $\hat{Y}_i$, the error due to violating the boundary conditions can be written as follows.
$$\begin{cases} -\hat{Y}_i, & \hat{Y}_i < 0 \\ 0, & 0 \le \hat{Y}_i \le 1 \\ \hat{Y}_i - 1, & \hat{Y}_i > 1 \end{cases}$$
which can be concisely expressed as $\mathrm{ReLU}(-\hat{Y}_i) + \mathrm{ReLU}(\hat{Y}_i - 1)$. We use this formulation to capture the BCC loss, which can thus be defined as follows.
$$E_{BCC} = \frac{1}{n}\sum_{i=1}^{n}\left[\mathrm{ReLU}(-\hat{Y}_i) + \mathrm{ReLU}(\hat{Y}_i - 1)\right]$$
Now we have three components constituting the customized loss function for PINN. OLS is responsible for minimizing the distance between the predicted and labeled RUL, MDC enforces the decreasing trend of the predicted RUL curve, and BCC penalizes predicted values that exceed the boundaries. Instead of simply adding them up, we introduce weights ($\alpha$, $\gamma$, $\beta$) on the three terms to control their influence on the total loss. Briefly, we can write Eq. (6) as follows.
$$E_{PINN} = (1-\alpha)\,E_{OLS} + \alpha\gamma\,E_{MDC} + \beta\,E_{BCC}$$
The purposes of the three parameters and their tuning principles are explained as follows.
• $\alpha$ is used to proportionally balance the error contributions between OLS (the residual error) and MDC. When we tweak $\alpha$, the contributions of OLS and MDC change in the same proportion. If we increase $\alpha$, the contribution of OLS decreases while the contribution of MDC increases.
• $\gamma$ is introduced to set the loss values of OLS and MDC on the same scale, such that we can jointly control the two parts in proportion.
• $\beta$ is set to control the weight of BCC in the total error. It is not tuned in proportion to OLS and MDC, because BCC contributes to the total error only when the estimation results cross the boundary conditions (larger than 1 or less than 0).
• If both $\alpha$ and $\beta$ are equal to 0, the loss function represents the case of the baseline neural network without any constraints or physical rules inserted.
By tuning the three parameters, we can flexibly control the proportions of the three error terms while minimizing the total estimation error. We would note the following. (1) Physics-Informed Neural Network (PINN) is not an independent NN but a technique that uses physical rules to strengthen an underlying NN, on top of which it is built. PINN is a general term applicable to all kinds of underlying NNs. Depending on the underlying NN, it may be more precisely termed PI-RNN (Physics-Informed RNN) if the underlying NN is an RNN, or PI-LSTM (Physics-Informed LSTM) if the underlying NN is an LSTM. The loss function $E_{PINN}$ is a general formulation for RUL estimation and is not bound to a particular type of NN. In the next section, we apply this formulation to vanilla RNN and LSTM, leading to PI-RNN and PI-LSTM, respectively. (2) The spirit of PINN is to regularize the underlying NN through physical rules associated with the problem under study. This is done by adding terms to the NN's loss function so that the learning algorithm produces more reasonable outputs. The original PINN was developed to solve problems whose physical rules are based on Partial Differential Equations (PDEs), but the spirit of PINN is regularization, i.e., formulating soft constraints into the loss function based on physical rules. The physical rules may or may not be representable by PDEs. While the original PINN is limited to the former case, our work expands its scope to cover the latter. As such, our work follows the spirit of PINN and generalizes the original PINN.
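The three loss terms and their weighting can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the combination `(1 - alpha) * OLS + alpha * gamma * MDC + beta * BCC` is inferred from the description of how $\alpha$, $\gamma$ and $\beta$ act, and dividing every term by the number of samples `n` is likewise an assumption. The function name `pinn_loss` and the example values are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def pinn_loss(y_true, y_pred, alpha=0.1, beta=1.0, gamma=0.1):
    """Return (total, e_ols, e_mdc, e_bcc) for one ordered RUL sequence.

    Assumed weighting: (1 - alpha) * OLS + alpha * gamma * MDC + beta * BCC.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_pred.size
    # OLS: mean squared residual between label and prediction.
    e_ols = np.sum((y_true - y_pred) ** 2) / n
    # MDC: penalize only increases, i.e. Y_hat[i] > Y_hat[i-1].
    e_mdc = np.sum(relu(np.diff(y_pred))) / n
    # BCC: penalize only values below 0 or above 1.
    e_bcc = np.sum(relu(-y_pred) + relu(y_pred - 1.0)) / n
    total = (1.0 - alpha) * e_ols + alpha * gamma * e_mdc + beta * e_bcc
    return total, e_ols, e_mdc, e_bcc

# Made-up example: one upward spike (0.7 -> 0.8) and two out-of-range values.
y_true = [1.0, 0.75, 0.5, 0.25, 0.0]
y_pred = [1.1, 0.7, 0.8, 0.2, -0.1]
total, e_ols, e_mdc, e_bcc = pinn_loss(y_true, y_pred)

# With alpha = beta = 0 the loss reduces to the plain MSE of the baseline.
baseline, _, _, _ = pinn_loss(y_true, y_pred, alpha=0.0, beta=0.0)
```

Because the order of samples matters for the MDC term, the predictions passed in must be in time order. In a real training setup the same expressions would be written with the differentiable operations of the chosen deep learning framework.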

Results and discussion
Experimental setup. In our experiments, we evaluate the performance of our PINN-based RUL estimation against RNN-based RUL estimation. We use the full IGBT degradation aging dataset from NASA 38. When applying our PINN formulation, we consider two types of baseline underlying RNNs: vanilla RNN and LSTM. We detail both in-sample and out-of-sample estimation performance for vanilla RNN, and report out-of-sample estimation performance for LSTM.
• In-sample estimation: The training data and testing data use samples from the same device or from the whole set of devices. The purpose is to evaluate the model's learning performance.
• Out-of-sample estimation: The training data and testing data use samples from different devices. The purpose is to evaluate the model's generalization performance. Meanwhile, we look into how the two PINN physical constraints influence the model's learning and performance.
For network training, we employed the well-known Adaptive moment estimation (Adam) algorithm 40, which is an improved version of the stochastic gradient descent optimization algorithm. As evaluation metrics, we use both the Mean Squared Error (MSE) and the coefficient of determination, called the $R^2$ score, which are commonly used criteria to measure performance in regression problems. The $R^2$ score is a statistic that provides another measure of goodness of fit. It is the proportion of variance in the dependent variable that is explained by the model. It is defined as follows.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$
where $Y_i$ denotes the ground truth of the RUL; $\hat{Y}_i$ denotes the predicted RUL; $\bar{Y}$ is the mean value of $Y_i$; and $n$ is the number of samples. Because it expresses the explained variation as a proportion, it makes different models easier to compare. The best score is 1.0, indicating that the predicted values and the labels match perfectly. The score is 0 if $\hat{Y}_i = \bar{Y}$, meaning that the model returns a constant estimate equal to the mean of the labeled true values. If the model is worse than that, the score is negative.
Since we use $\alpha$ to balance the losses between the ordinary and monotonic decreasing errors, we need to make sure that their error contributions are on the same scale. In the experiment, we added a weight of 0.1 to the monotonic decreasing constraint. This means that $\gamma = 0.1$ in Eq. (6).
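The two evaluation metrics can be sketched directly from their standard definitions. The function names below are hypothetical, and the example values are made up to illustrate the three characteristic cases of the $R^2$ score (perfect fit, constant mean prediction, and worse than the mean).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y = np.linspace(1.0, 0.0, 5)                       # toy linear RUL labels
r2_perfect = r2_score(y, y)                        # predictions equal labels
r2_mean = r2_score(y, np.full_like(y, y.mean()))   # constant mean estimate
r2_reversed = r2_score(y, 1.0 - y)                 # rising curve, worse than the mean
```

A prediction that rises while the label falls yields a negative score, which is why the $R^2$ score complements the MSE when judging whether an estimated RUL curve has a reasonable shape.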
The NASA dataset and pre-processing. The NASA IGBT dataset. The IGBT dataset is an open-source dataset from the NASA Prognostics Center of Excellence (PCoE) Data Set Repository 38. The device type is the International Rectifier IRG4BC30KD IGBT with a 600 V/15 A rating in a TO220 package. The data were collected from an IGBT thermal overstress experiment, where a square signal was applied at the IGBT gate and parameters such as the gate-emitter voltage ($V_{ge}$), collector-emitter voltage ($V_{ce}$), and collector-emitter current ($I_{ce}$) were recorded 39.
The failure mode is transistor latch-up (not package failure). The latch-up failure leads to a high current between the collector and the emitter, which can be captured by the drastic drop in the collector-emitter voltage ($V_{ce}$). The latch-up failure itself does not cause immediate damage to the IGBT; it is the thermal runaway caused by latch-up that damages the device. However, in the experiment 39, a temperature threshold controller was used to prevent this damage by turning off the load power supply and terminating the test once thermal runaway (temperature exceeding the threshold) occurred. In this way, the device can still be functional after the latch-up failure point, while the failure mechanism was still simulated.
As with previous studies 2,5,33, we consider the collector-emitter voltage ($V_{ce}$) as the precursor signal. We regard the abrupt drop in the collector-emitter voltage ($V_{ce}$) as the device failure point. Only four devices are given in the dataset, numbered from device 2 to device 5. Please note that we keep the same device numbering in the paper as in the original dataset.
Data preprocessing. Referring to the data acquisition experiment 39, we identified the failure points of the four devices from the precursor signal ($V_{ce}$) and cut off the data after the devices failed. The original $V_{ce}$ signals of all four devices are visualized in Fig. 2. We used the following three steps to preprocess the data set.
Average downsampling. Since a square signal was applied at the gate of the IGBT device in the experiment, the collector-emitter voltage was a square wave as well. Therefore, we downsampled the raw data to one sample per square wave cycle by calculating the average value over that cycle.
Standardization. The downsampled dataset was standardized to zero mean ($\mu = 0$) and unit standard deviation ($\sigma = 1$) so that the prediction did not depend on the exact data values.
Window smoothing. The Exponential Moving Average (EMA) algorithm 41 was applied to smooth the standardized data, making it easier for the neural network to learn and fit. In contrast to the Simple Moving Average (SMA) algorithm, EMA puts more weight on the most recent data points. In the EMA function, $x_t$ denotes the original series, $y_t$ denotes the smoothed series, and $\theta$ denotes the decay factor given by $\theta = 2/(\text{span} + 1)$, where span is set to 15 as the width of the sliding window applied to the original series.
As an example, Fig. 3A-D shows the original data, after the average down-sampling, after the standardization, and after the window smoothing, respectively. The smoothed dataset for all four devices is drawn in Fig. 4.
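The three preprocessing steps can be sketched as follows. This is a sketch under several assumptions that are not confirmed by the text: a fixed number of raw samples per square wave cycle (`samples_per_cycle` is hypothetical), the population standard deviation for standardization, and the common recursive EMA form $y_t = \theta x_t + (1 - \theta) y_{t-1}$ with $y_0 = x_0$. The exact EMA variant used in the experiments may differ.

```python
import numpy as np

def average_downsample(raw, samples_per_cycle):
    """Average each block of `samples_per_cycle` raw points into one sample."""
    raw = np.asarray(raw, dtype=float)
    n_cycles = raw.size // samples_per_cycle
    trimmed = raw[: n_cycles * samples_per_cycle]   # drop an incomplete last cycle
    return trimmed.reshape(n_cycles, samples_per_cycle).mean(axis=1)

def standardize(x):
    """Shift to zero mean and scale to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def ema(x, span=15):
    """Recursive exponential moving average with theta = 2 / (span + 1)."""
    x = np.asarray(x, dtype=float)
    theta = 2.0 / (span + 1)
    y = np.empty_like(x)
    y[0] = x[0]
    for t in range(1, x.size):
        y[t] = theta * x[t] + (1.0 - theta) * y[t - 1]
    return y

# Synthetic stand-in for a raw V_ce recording: 50 cycles of 4 points each.
rng = np.random.default_rng(0)
raw = np.tile([0.0, 0.0, 1.0, 1.0], 50) + rng.normal(0.0, 0.05, 200)
smoothed = ema(standardize(average_downsample(raw, samples_per_cycle=4)))
```

With span = 15 the decay factor is 0.125, so each smoothed value keeps 87.5% of the previous smoothed value and takes 12.5% from the newest sample.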

In-sample estimation performance: RNN versus physics-informed RNN (PI-RNN).
In the in-sample experiment, we trained RNN models for each individual device and for the four devices as a whole, with 80% of the data for training and 20% for testing. Since our monotonic decreasing condition needs the previously predicted RUL to calculate the loss, the sequence order of the data samples must be preserved. To this end, the last sample of every 5 consecutive samples was extracted, and the extracted samples were concatenated to form the test data, as illustrated in Fig. 5. Table 1 and Fig. 6 compare the in-sample performance of RNN and PI-RNN with $\alpha = 0.1$ and $\beta = 1$. PI-RNN achieves a better MSE than RNN in both training and testing, with improvements of 38.86% and 35.69%, respectively. PI-RNN has the most significant MSE improvement on Device 3, with 74.4% for training and 78.4% for testing. The smallest MSE improvement appears on Device 2, with 20.56% for training and 12.79% for testing. Compared with the other devices, Device 2 has a much larger error in both training and testing, for PI-RNN and RNN alike. This is due to the vague $V_{ce}$ feature of Device 2 in the second half of its lifetime (see Fig. 4). With all devices as a whole, PI-RNN reduces the training error by 45% and the testing error by 41.57%. For the $R^2$ score, PI-RNN and RNN achieve comparable performance, with PI-RNN slightly better.
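The order-preserving 80/20 split described above can be sketched with index arithmetic. The function name `interleaved_split` is hypothetical, and the handling of a trailing incomplete block (it stays in the training set) is an assumption.

```python
import numpy as np

def interleaved_split(n_samples, block=5):
    """Send the last sample of every `block` consecutive samples to the test set.

    Both index arrays stay in time order, so a loss term that compares
    consecutive predictions can still be evaluated on each subset.
    """
    idx = np.arange(n_samples)
    is_test = (idx % block) == (block - 1)
    return idx[~is_test], idx[is_test]

train_idx, test_idx = interleaved_split(20)
# test_idx is [4, 9, 14, 19]; the remaining 16 indices form the training set.
```

Taking every fifth sample spreads the test points over the whole lifetime, so both subsets cover the early, middle, and late degradation phases.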

Out-of-sample estimation performance: RNN versus physics-informed RNN (PI-RNN).
To evaluate the out-of-sample performance, we employed 4-fold cross-validation, with 4 cases set up as listed in the left three columns of Table 2. For each case, 3 of the 4 devices were selected for training and 1 for testing. We first evaluate the impact of the monotonic decreasing condition, then the impact of the boundary condition, and finally the impact of both conditions.
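The 4-fold setup amounts to leaving one device out per case. The sketch below enumerates such cases for devices 2 to 5; which device is held out in which numbered case is defined in Table 2 and is not reproduced here, so the ordering produced by this hypothetical helper is an assumption.

```python
def leave_one_device_out(devices):
    """Return a list of (training_devices, test_device) pairs, one per device."""
    cases = []
    for held_out in devices:
        train = [d for d in devices if d != held_out]
        cases.append((train, held_out))
    return cases

# Device numbering follows the dataset (devices 2 to 5).
cases = leave_one_device_out([2, 3, 4, 5])
```

Each device serves as the unseen test device exactly once, so the averaged results reflect generalization across all four devices.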
Influence of monotonic decreasing condition. Four groups of experiments were conducted by tweaking the parameter $\alpha$, which weighs the monotonic decreasing condition, to tune its contribution to the total loss function. We trained the PI-RNN with $\alpha = 0, 0.1, 0.3, 0.5, 0.7, 0.9$, and the results of the four cases are shown in Fig. 7. Note that $\alpha = 0$ means the absence of the monotonic decreasing constraint, so the model is the RNN. For all four cases, the slope of the predicted RUL becomes flatter as $\alpha$ increases. Without the monotonic decreasing constraint ($\alpha = 0$), the RUL predicted by the RNN could go up, which is impossible in real life. However, by increasing the weight of the monotonic decreasing constraint, i.e., increasing the value of $\alpha$, we could eliminate the spikes on the curves, which are marked with red circles in Fig. 7.
Influence of boundary condition. The boundary condition is intended to limit the predictions to the value range from 0 to 1. PI-RNN with only the boundary condition but without the monotonic decreasing condition ($\alpha = 0$, $\beta = 100$) is compared with the original RNN in Fig. 8. In Cases 2, 3, and 4, as the RUL predicted by the original RNN always ranges from 0 to 1, the boundary condition cannot make much difference. This is expected because the underlying boundary condition is already fulfilled. However, in Case 1, the original RNN predicts some RUL values less than 0 near the end of the lifetime, which violates the boundary condition. After applying the boundary condition constraint, the predictions converge to 0. Also, the testing loss is reduced by 76%, from $14.99 \times 10^{-3}$ to $3.59 \times 10^{-3}$.
Influence of both physical conditions. When applying both physical conditions, PI-RNN can improve the performance, in both MSE and $R^2$ score, of training and testing in all 4 cases, as shown in Table 2.
We would note that, in our experiments, the RUL tends to be constant at the end. This is because the precursor signal reaches a final constant level before failure. To decide whether the device is reaching its end of lifetime, we may consider a refined strategy: (1) differentiate and determine multiple device health stages, e.g., healthy, sub-healthy, pre-failure, and failure; (2) when the pre-failure health stage is reached, fit a polynomial model instead to estimate the RUL for the rest of the lifetime. Since this work focuses on PINN for RUL estimation, this can be a direction for future investigation.
Out-of-sample estimation performance: LSTM versus physics-informed LSTM (PI-LSTM). As a more powerful variant of RNN, LSTM has also been proposed for RUL estimation 3,7. To show that our PINN formulation can also work well with LSTM, we evaluated the out-of-sample estimation performance of pure LSTM and physics-informed LSTM (PI-LSTM) with the same 4-fold cross-validation described in the previous subsection. The structure of the LSTM is similar to that of the RNN in the previous experiments: 1 neuron in the input layer, 80 LSTM cells in the first hidden layer, 10 neurons in the second hidden layer, and 1 neuron in the output layer. Due to the three additional gating mechanisms (forget gate, input gate, and output gate) in an LSTM cell, it has four times the number of weight/bias parameters of a corresponding vanilla RNN unit. As a result, the total number of parameters in the LSTM is approximately four times that of the RNN (27,061 parameters in the LSTM and 7,381 parameters in the RNN). Both physical conditions were applied in the physics-informed LSTM with the parameters $\alpha = 0.1$ and $\beta = 100$. The MSE and $R^2$ score are shown in Fig. 10.
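The two parameter totals quoted above can be checked with a short count. The sketch assumes one input feature, one bias vector per weight set, and the standard LSTM layout with four weight sets per cell (three gates plus the candidate state); the helper names are hypothetical.

```python
def recurrent_params(n_in, n_units, n_weight_sets=1):
    """Input weights + recurrent weights + biases, repeated per weight set."""
    per_set = n_units * n_in + n_units * n_units + n_units
    return n_weight_sets * per_set

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def total_params(n_weight_sets):
    """80 recurrent units, then dense layers with 10 neurons and 1 neuron."""
    return (recurrent_params(1, 80, n_weight_sets)
            + dense_params(80, 10)
            + dense_params(10, 1))

rnn_total = total_params(1)    # 6560 + 810 + 11 = 7381
lstm_total = total_params(4)   # 26240 + 810 + 11 = 27061
```

Only the recurrent layer grows by a factor of four; the two dense layers are identical in both networks, which is why the overall ratio is somewhat below four.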
After introducing the two physical conditions, the performance of the LSTM improves in both MSE and $R^2$ score, with only one exception: a mere 1% rise in the testing MSE in Case 1. Over all four cases, the MSE is improved on average by 15.3%, from $4.813 \times 10^{-3}$ to $4.074 \times 10^{-3}$, in training and by 13.9%, from $5.514 \times 10^{-3}$ to $4.746 \times 10^{-3}$, in testing.
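As a quick arithmetic check, the relative MSE improvement can be computed from the averaged MSE values quoted above. The helper name is hypothetical. Small differences from the reported percentages are possible, for example from rounding or from averaging per-case improvements rather than comparing averaged MSE values; which of these applies is not stated in the text.

```python
def relative_improvement(baseline_mse, new_mse):
    """Percentage reduction of the MSE relative to the baseline."""
    return 100.0 * (baseline_mse - new_mse) / baseline_mse

# Averaged LSTM vs. PI-LSTM values quoted in the text.
train_gain = relative_improvement(4.813e-3, 4.074e-3)
test_gain = relative_improvement(5.514e-3, 4.746e-3)
```

Both values land within a tenth of a percentage point of the reported 15.3% and 13.9%.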

Conclusion
We have proposed an RUL estimation method for IGBTs using PINN. The physical rules are identified from the target RUL function and formulated as two regularization terms (monotonic decreasing and boundary conditions) in the loss function of the underlying NN. By adjusting their weights, the regularization makes the trained neural network conform to a monotonic decreasing trend and removes negative values. We have applied our method to RNNs for RUL estimation using the NASA IGBT data set. In the in-sample estimation experiments with vanilla RNN, our physics-informed RNN improves the MSE of the baseline underlying RNN on average by 38.86% in training and by 35.69% in testing. In the out-of-sample estimation experiments with vanilla RNN, our physics-informed RNN improves the MSE of the baseline underlying RNN on average by 24.7% in training and by 51.3% in testing. In the out-of-sample estimation experiments with LSTM, our physics-informed LSTM improves the MSE of the baseline underlying LSTM on average by 15.3% in training and by 13.9% in testing. This implies a large expansion of the NN models' generalization capability. In both in-sample and out-of-sample estimation, PINN does not compromise the $R^2$ score; in fact, it slightly enhances the $R^2$ score in all cases. Our approach opens a new path for RUL estimation by combining data-driven methods with physics information. Perhaps more significantly, it can inspire the expansion of PINN to other non-mathematical real-life problems in which physical rules need to be identified and formulated into the underlying NN's loss function for regularization.

Data availability
The dataset used and analyzed during the current study is available in the NASA Prognostics Center of Excellence (PCoE) Data Set Repository, Data Set 8 on Insulated-Gate Bipolar Transistor (IGBT) Accelerated Aging. https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository.